A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
Authors
Abstract
Decision trees are among the most popular pattern types in data mining due to their intuitive representation. However, little attention has been paid to defining measures of semantic similarity between decision trees. In this work, we present a general framework for similarity estimation that includes, as special cases, the estimation of semantic similarity between decision trees and various forms of similarity estimation on classification datasets with respect to different probability distributions defined over the attribute-class space of the datasets. The similarity estimation is based on the partitions that the decision trees induce on the attribute space of the datasets. We use the framework to estimate the semantic similarity of decision trees induced from different subsamples of classification datasets, and we evaluate its performance against the empirical semantic similarity, which we estimate on independent hold-out test sets. The availability of similarity measures on decision trees opens a wide range of possibilities for meta-analysis and meta-mining of data mining results.
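The partition-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes axis-aligned trees over numeric attributes, a bounded attribute space, and a uniform distribution over that space, whereas the framework covers other distributions as well. The tree encoding and the names `predict`, `leaves`, `partition_similarity`, and `empirical_similarity` are hypothetical, and "empirical semantic similarity" is taken here to mean the fraction of hold-out instances on which both trees predict the same class.

```python
import random

# Hypothetical encoding: a tree is either a class label (leaf, a string) or a
# tuple (attribute_index, threshold, left_subtree, right_subtree); instances
# with x[attribute_index] <= threshold go left.

def predict(tree, x):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, thr, left, right = tree
        tree = left if x[attr] <= thr else right
    return tree

def leaves(tree, box):
    """Yield (box, label) for every leaf; box is a list of (low, high) per attribute."""
    if not isinstance(tree, tuple):
        yield box, tree
        return
    attr, thr, left, right = tree
    lo, hi = box[attr]
    if lo < thr:  # the left region is non-empty
        left_box = list(box)
        left_box[attr] = (lo, min(hi, thr))
        yield from leaves(left, left_box)
    if hi > thr:  # the right region is non-empty
        right_box = list(box)
        right_box[attr] = (max(lo, thr), hi)
        yield from leaves(right, right_box)

def overlap_volume(a, b):
    """Volume of the intersection of two hyperrectangles (0.0 if disjoint)."""
    vol = 1.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        side = min(ahi, bhi) - max(alo, blo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def partition_similarity(t1, t2, box):
    """Share of the attribute space (uniform measure, an assumption) on which
    the two trees assign the same class, computed by overlaying their partitions."""
    total = 1.0
    for lo, hi in box:
        total *= hi - lo
    agree = sum(overlap_volume(b1, b2)
                for b1, c1 in leaves(t1, box)
                for b2, c2 in leaves(t2, box)
                if c1 == c2)
    return agree / total

def empirical_similarity(t1, t2, instances):
    """Fraction of hold-out instances on which both trees predict the same class."""
    return sum(predict(t1, x) == predict(t2, x) for x in instances) / len(instances)

# Two small illustrative trees over the unit square.
box = [(0.0, 1.0), (0.0, 1.0)]
t1 = (0, 0.5, "a", "b")
t2 = (0, 0.6, "a", (1, 0.5, "b", "a"))

random.seed(0)
holdout = [(random.random(), random.random()) for _ in range(20000)]
print(partition_similarity(t1, t2, box))
print(empirical_similarity(t1, t2, holdout))
```

For these two trees the partition-based value is 0.7 up to floating-point rounding (the trees agree on the region with the first attribute at most 0.5, and on the part of the region above 0.6 where the second attribute is at most 0.5). The hold-out estimate should land close to it because the hold-out points are drawn from the same uniform distribution. With a non-uniform distribution, the cell volumes would be replaced by the probability mass of each cell.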
Similar resources
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the choice of semantic web services and, in particular, quality of service (QoS). The present work illustrates a procedure for selecting among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...
Mining Accurate Shared Decision Trees from Microarray Gene Expression Data for Different Cancers
This paper studies the problem of mining shared decision trees across multiple application domains, including multiple microarray gene expression datasets for different cancers. Shared knowledge structures capture similarity between application domains and have many useful applications. Given two datasets with classes, we focus on shared decision trees that are highly accurate in both datasets ...
An improved similarity measure of generalized trapezoidal fuzzy numbers and its application in multi-attribute group decision making
Generalized trapezoidal fuzzy numbers (GTFNs) have been widely applied in uncertain decision-making problems. The similarity between GTFNs plays an important part in solving such problems, but existing similarity measures have some limitations. Thus, based on cosine similarity, a novel similarity measure of GTFNs is developed, which is combined with the concepts of geometric...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Journal title:
Volume, issue
Pages -
Publication date: 2008